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Abstract 

s. 

The  Paige  style  Lanczos  algorithm  is  an  iterative  method  for  finding  a 
few  eigenvalues  of  large  sparse  symmetric  matrices.  Some  beautiful  rela- 
tionships among  the  elements  of  the  eigenvectors  of  a symmetric  tridiagonal 
matrix  are  used  to  derive  a perverse  starting  vector  which  delays  convergence 
as  long  as  possible.  Why  such  slow  convergence  is  never  seen  in  practice 
is  also  examined. 
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1.  Introduction 


In  1950  Lanczos  [2]  presented  an  algorithm  for  reducing  a symmetric 
matrix,  call  it  A,  to  tridiagonal  form.  The  algorithm  begins  with  an  arbi- 
trary unit  vector  . It  produces  a tridiagonal  matrix  I and  an  ortho- 

gonal matrix  Q such  that 


(D 


Q*AQ  = T and  Qe^  = 


In  practice  the  algorithm  could  not  compete  in  speed  or  accuracy  with  later 
methods  based  on  explicit  orthogonal  transformations. 

In  1971  Paige  [3]  introduced  a modified  version  of  the  algorithm  which 
could  be  used  effectively  to  find  a few  eigenvalues,  and  their  eigenvectors 


too  if  desired,  of  a large  sparse  symmetric  matrix.  Paige  suggested  terminat- 


• th 


ing  the  process  prematurely  at,  say,  the  j step  with  T.  the  j*j  leading 

J 


principal  minor  of  T and  the  first  j columns  of  Q in  hand.  In 


exact  arithmetic 


(2) 


Q AQ.  = T. 
J J J 


Let  the  spectral  decomposition  of  be 


(3) 

Define 


T.  = S.0S^  with  0 = diag(0^',0p, . . . ,0^ ) and  S*S.  = I. 
J J J ^ J J J *3 


Vj  = (y^,yj,...,yj)  = QjS.  . 


Then  6^,0^,...,ej  are  the  Rayleigh-Ritz  approximations  to  the  eigenvalues 
of  A (commonly  called  Ritz  values)  derivable  from  the  subspace  spanned  by 
the  columns  of  Q ^ , and  y^,y^,...,yj  are  the  corresponding  Ritz  vectors. 


The  norm  of  the  residual  of  y^ , namely  II Ay^ -0^y ^ II , is  a bound  on  the 


accuracy  of  0^  as  an  approximation  to  an  eigenvalue  of  A. 


f- 
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The  Kaniel-Paige  error  estimates  [1,3]  lead  us  to  expect  that  some  of 
the  eigenvalues  of  should  converge  (have  negligible  error  bound)  for 
j « n,  provided  only  that  the  starting  vector  is  not  pathologically 
deficient  in  the  corresponding  eigendirections  of  A.  Numerical  tests  by 
Paige  and  other  researchers  have  confirmed  that  convergence  occurs  relatively 
quickly.  Despite  this  abundance  of  evidence,  Paige  was  unable  to  prove  that 
convergence  of  some  Ritz  value  must  occur  before  j = n = dim(A)  at  which 
point,  in  exact  arithmetic,  T is  similar  to  A so  that  all  the  "approxi- 
mations" are  exact  and  all  the  bounds  are  zero. 

There  are  several  interesting  unresolved  problems  connected  with  the 
Lanczos  process.  Except  in  its  last  section,  this  paper  is  restricted  to 
the  theoretical  behavior  of  the  algorithm  in  the  context  of  exact  arithmetic. 

In  the  following  section  we  derive  some  beautiful  relationships  among  the 
elements  of  the  eigenvectors  of  a symmetric  tridiagonal  matrix  which  may  be 
of  interest  in  their  own  right.  In  Section  3 these  results  are  applied  to  obtain 
formulas  for  the  Lanczos  starting  vector.  Ir.  Section  4 these  formulas  are  used 
to  find  a perverse  starting  vector  for  matrices  with  well  separated  eigenvalues 
which  delays  convergence  until  j=n.  Section  5 generalizes  the  construction 
to  matrices  with  close  or  multiple  eigenvalues  to  yield  a vector  which 
delays  convergence  for  a long  time.  The  final  section  will  indicate  why 
such  slow  convergence  is  never  seen  in  practice. 

2 . The  Eigenvectors  of  a Symmetric  Tridiagonal  Matrix 

Definition.  Let  adj(R)  be  the  transpose  of  the  matrix  of  cofactors 
of  R.  This  is  usually  called  the  adjugate  or  classical  adjoint  of  R.  By 
the  Cauchy-Binet  theorem 


(1) 


R adj(R)  = det (R) I . 


r 
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Theorem  2.1  (Thompson  and  McEnteggert,  1968).  Let  A = ZAZ* 
with  A = diag(X1,X2>...,An),  Z = (z1,z2,...,z  ) and  Z*Z  = I. 
Then  for  i = 1 ,2,3, ... ,n 

n 


adj^I-A)  = n (X  -Xi)ziz*  = X^.^z* 


j=l 


where  X^(u)  is  the  derivative  of  the  characteristic  polynomial 
of  A. 


Note  that  if  X.  is  a multiple  eigenvalue  of  A,  then  x'A\.)  = 0,  so 

n 1 


that  the  ambiguity  in  the  choice  of  eigenvectors  doesn't  matter. 


V-l 


Proof.  Let  u f A..,  for  all  i,  so  that  (pI-A)  exists.  Then 


adj(yl-A)  = det(y I-A) (yl-A) 


-1 


= xA(u)z(yl-A)'V 


= ZAZ 
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does  not  involve  division,  adj(R)  is  a continuous  function  of  R.  There- 
fore the  last  equation  must  hold  even  for  u = A.. . Setting  p = for 
i = 1 ,2,3, ... .n,  yields 


adj(A.I-A)  = ZAZ*  , 


where 


if  k f i , 


akk  ■ " <W  ■ n 
J i n 


n (A .-A . ) if  k = i . 
j=l  1 J 
j^i 


Since  n (A. -A.)  = xi(A-),  the  result  follows. 
j=l  3 1 


Thompson  and  McEnteggert  were  working  with  general  Hermitian  matrices. 

The  application  of  their  theorem  to  tridiagonal  matrices  was  made  by  Paige  [3], 


Notation.  Let 


a B 

r r 

er  ar+l  8r+l 
6r+l  ar+2 


o 


o 


Let  Xy.  *(b)  = det(pI-T  .),  the  characteristic  polynomial  of  T 


1et  xr,r-l^  ' 1 for  a11  r- 


and 
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Theorem  2.2  (Paige,  1971).  Let  T = T,  = S0S*  with 

i , n 

0 = diag(0-j  ,02»- • • » Jn)  and  S*S  = I.  Then  for  r < t and  all  i, 
X1  ,n(6i)sristi  = X1  ,r-l  (0i  )3r3r+l  ' "Bt-lxt+l  ,n^°i  * ‘ 


In  particular 


xi,n(0i)sri  = xlIr-l(01)xrfl,n{ei) 


Proof.  By  Theorem  2.1 


(2)  adj^I-T)  = xi^lepSjS*  . 

The  (r,t)  element  of  the  RHS  of  (2)  is  xi  (9.  )s  -s.. . Because  of  the 

IjM  1 II  LI 

tridiagonal  form  of  T,  the  (r,t)  element  of  the  LHS  of  (2)  is 

Xl.r-l(ei>BA+r"6t-lXt*l,n(0i>-  For  examp,e- 


'(55)  ei 

‘B1  V“2  '"2 


8.1  - T,  , 

l 1 ,6 


fBg)  lJi_a3  ~B3 

-63  ^4  "B4 


-B4  Va5  -65  j 

i -b5  VV 


The  circled  elements  contribute  to  the  (2,3)  cofactor.  Note  that  the  minus 
signs  on  the  B's  cancel  with  the  alternating  signs  associated  with  the 
cofactors.  □ 


3 • Formulas  for  Starting  Vectors 

The  Lanczos  algorithm  begins  with  an  arbitrary  unit  vector  and 
terminates  with  a tridiagonal  matrix  T and  an  orthogonal  matrix  Q such 
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that  q-j  is  the  first  column  of  Q and  AQ  = QT.  The  process  is  geometric, 
i.e.  it  is  invariant  under  orthogonal  changes  in  coordinates.  The  coordinates 
which  give  the  most  insight  into  the  process  are  the  eigenvectors  of  A. 

In  these  coordinates  the  operator  A is  diagonal  and  the  matrix  Q 
becomes  the  transpose  of  S,  the  matrix  of  eigenvectors  of  T.  The  matrix 
equation  AQ  = QT  becomes 


AS*  = S*T 


where  A = diag(A^ . . . ,An) . 


Theorem  3.1.  Let  AS*  = S*T  as  above.  Then  for  i 
(i)  siisniXi(^)  * eiB2"‘3n-l  H "n  d constant» 

snixA^V  = xlfn-l^V* 


Proof.  Since  A is  similar  to  T,  X^M  = Xj(v)  = x(u)- 

(i)  This  is  Theorem  2.2  with  r = 1 and  t = n. 

(ii)  This  is  Theorem  2.2  with  r = n and  t = n. 


In  order  to  refer  to  the  Lanczos  vectors,  we  need  names  for  the  columns 
of  S*.  For  this  purpose,  when  A = A,  we  define 

(2)  P = (P] ,p2,. . . ,p( ' S*  . 

Theorem  3.1(i)  relates  the  first  Lanczos  vector  p.|  to  the  last  Lanczos 

vector  pn>  Theorem  3 . 1 ( i i ) relates  pn  to  the  eigenvalues  of  T^  n ^ which 

s t 

are  the  approximations  to  eigenvalues  of  A furnished  by  the  (n-1)  step 
of  the  Lanczos  algorithm.  Since  T1  n is  similar  to  A,  the  Cauchy  Inter- 
lace theorem  requires  that  the  eigenvalues  of  T -j,  call  them  ,p2, . . . >un_i 


I 


8 


satisfy  the  inequalities 


(3) 


X,  £ p-i  £ ^2  — 


< y ■, 

- n-1 


< X 


Since  T,  , the  tridiagonal  matrix  produced  by  the  Lanczos  algorithm,  will 
i , n 

be  unreduced,  the  interlacing  must  be  strict. 


Theorem  3.2.  Let  A = ZAZ*  with  A = di ag( X^ , . . . .X^)  and 
Z*Z  = I.  Then  the  Lanczos  algorithm  run  with  a starting  vector 
q^  produces  a T-|  n ^ with  eigenvalues  b-j  < y->  < * ■ • < yn  ^ if 
and  only  if  q^  = Zp^ , where 


? n n-1 

= /[  n (x.-x.)  n (x.-y .)] 

" j=i  1 J j=i  1 J 


Strict  interlacing  is  required  to  make  the  quantity  in  the  brackets  positive 
for  all  i . 


Proof.  The  Lanczos  algorithm  produces  the  same  T whether  it  runs  on 
the  pair  (A,q^)  or  (A,p^).  Combining  the  two  parts  of  Theorem  3.1  and 
changing  to  the  P notation  yields 

(4)  P1lX'(X,)x, , 


for  any  starting  vector  p^ . 

If  p-j  is  known,  then  by  interpolation,  n j(u)  can  be  found  from 
(4),  up  to  a constant  factor.  Hence  y^ .y^,. . . ,yn_| , the  zeros  of  Xj  n_](y) 
can  be  found. 


If  y1  ,y2, . . . ,un_i  are  given,  then  p^,  for  i = 1,2,, 

2 

found  from  (4)  up  to  the  multiplicative  factor  -rt 


,n,  can  be 


which  can  be  determined 
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by  the  required  normalization  of  . The  ambiguity  in  the  choice  of  sign 
for  each  component  of  p^  merely  reflects  the  choice  of  sign  for  each  eigen- 
vector of  T.  All  choices  yield  the  same  tridiagonal  matrix  T.  □ 

The  required  q^  depends  on  both  the  eigenvalues  and  on  the  eigenvectors 
of  A.  The  expression  q^  = Zp^  clarifies  their  roles;  Z is  independent 
of  the  A.j , while  p^  is  independent  of  Z. 

Example.  Let  A = diag(l ,3, 5, 7, 9)  and  p.  = 2 i , for  i = 1,2, 3, 4. 

X'(l)xyO)  = (1-3)  (1-5)  (1-7)  (1-9)  (1-2)  (1-4)  (1-6)  (1-8)  = 40320 
X'(3)xy(3)  = (3-1 ) (3-5) (3-7) (3-9) (3-2) (3-4) (3-6) (3-8)  = 1440 

X’(5)Xy(5)  = (5-1 ) (5-3) (5-7) (5-9) (5-2) (5-4) (5-6) (5-8)  = 576 

X'(7)xy(7)  = (7-1 ) (7-3) (7-5) (7-9) (7-2) (7-4) (7-6) (7-8)  = 1440 

X’(9)x  (9)  = (9-1 ) (9-3) (9-5) (9-7) (9-2) (9-4) (9-6) (9-8)  = 40320 

so 

Pn  = p^  = tt  *^40320  = .00498t^ 

P21  = P41  = "s^440  = • 02635tt5 

P31  = = .04167tt5  . 

By  normalization  ttc  = 17.749  and 
0 

P1  = (.0880,  .4677,  .7396,  .4677,  .0880)*  . 

The  Lanczos  algorithm  run  on  A with  p^  as  the  starting  vector  yielded 
a T^  with  eigenvalues  2,  4,  6,  and  8 correct  to  the  precision  of  the 


machine  used. 


□ 


4 . Slow  Convergence 
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Before  examining  the  convergence  properties  of  the  Lanczos  algorithm  in 
the  light  of  Theorem  3.2,  it  is  useful  to  describe  in  more  detail  the 
properties  of  the  Rayleigh-Ritz  procedure.  Let  W be  any  subspace  of  Fn 
and  let  Pw  denote  the  orthogonal  projection  of  Fn  onto  W.  Then  the 
Rayleigh-Ritz  approximations  to  eigenpairs  of  A obtained  from  W are 
precisely  the  eigenpairs  of  P^A  whose  eigenvectors  lie  in  W. 


Theorem  4.1.  If  II  cj  then  the  Rayleigh-Ritz  approximations 
for  A obtained  from  V are  the  same  as  the  Rayleigh-Ritz 
approximations  for  P^A  obtained  from  V.  Further  if  (y,0) 
is  a Ritz  pair  then  II (P^A)y  - y0 II  £ II Ay  - yQ II , with  equality 
holding  if  and  only  if  the  residual  vector,  Ay-yQ,  lies  in  W. 


Proof.  Let  Py  be  the  orthogonal  projection  onto  V.  Then 
Pv(PwA)  = (pvpw)A  = pvA,  since  V c W.  Since  y € V C W, 

II  (PwA)y  - Gyll  = II  pw  ( Ay  - y0 ) II . Finally  since  P,,j  is  an  orthogonal  projection, 
II  Pw(Ay-y0)  II  £ II  Ay -yell,  with  equality  holding  if  and  only  if  Ay  - y9  GW.  □ 


Corollary.  If  V and  W are  nested  Krylov  subspaces  of 
different  dimensions,  then  HAy-yOll  = IIP^Ay-yell. 

Proof.  If  V = Kj(q^)  and  W = q1 ) for  k > j,  then,  since 
y S K.(q1 ),  Ay  - ye  G Kj+1 (q] ) C KR(q1 ) = W. 

We  now  examine  the  convergence  properties  of  the  Lanczos  algorithm. 


The  reader  is  directed  to  Section  1 for  the  terminology. 
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Theorem  4.2. 

Suppose  that 

the  Lanczos  algorithm  when  run  on 

r — 

O" 

#> 

<c 

produces 

, ,u  . as  Ritz  values  at  the  n-lst 
’ n-1 

step. 

Let 

(y,e)  be  a Ritz  pair  from  any  step  except  the  n* 

Then 

Y = 

II  Ay -yell  > 6^/2  , 

where 

6 = 
U 

min  |p.-A, | . 
i <n- 1 1 k 

k<n 

Proof. 

For 

any  vector  x 

and  any  scalar  t it  is  well  known 

(5)  min  j X . - t | < II Ax-xx II  . 

i<n 

In  particular, 

(6)  min  | X . -0 j < li Ay-y0 II  = y , and 

i<n 

(7)  min  |p.-0|  < IIPwAy-y0ll  = y , 


with  the  last  equality  following  from  the  Corollary.  The  smallest  y which 
can  satisfy  both  (6)  and  (7)  is  6^/2.  E 

In  practice  y can  not  be  as  small  as  6^/2 . However  no  significantly 
stronger  bound  can  be  obtained.  In  particular,  the  smallest  residual  at  the 

c f 

n-1  step  can  be  6 . 

u 

The  combination  of  Theorem  3.2  and  Theorem  4.2  yields  the  following. 


Theorem  4.3.  Let  A be  a symmetric  matrix  with  eigenvalues 

A.  < x < < A . Let  6.  = min  | A - -X.  |.  Then  there  exists  a 

1 d n A i^k  1 K 

starting  vector  for  the  Lanczos  algorithm  such  that  the  residual 

norm  of  any  Ritz  vector  at  any  step  j < n will  be  larger  than  5^/4. 
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Proof.  If  A has  multiple  eigenvalues  then  6^  = 0 and  any  vector 

will  do,  so  we  may  assume  that  A has  distinct  eigenvalues.  Let 

y..  = (\^+X^+^)/2,  for  i = l,2,...,n-l,  and  let  be  any  starting  vector 

generated  by  Theorem  3.2.  With  this  choice  of  ,0^.  • • • >v*n  -|  will  be 

the  Ritz  values  at  the  n-ls*"  step  and  6 = 6»/2.  The  result  now  follows 

y A 

from  Theorem  4.2.  □ 

If  the  spectrum  of  A is  such  that  6^/4  is  larger  than  some  given 
convergence  tolerance,  then  Theorem  4.3  shows  that  there  exist  perverse 
starting  vectors  which  delay  convergence  until  the  n*'*1  step.  This  result 
does  not  imply  that  no  earlier  Ritz  value  is  accurate  enough,  it  only 
guarantees  that  the  corresponding  bound  will  not  reveal  such  accuracy.  In 
the  previous  example  of  A = di ag ( 1 ,3, 5, 7, 9)  and  y.  = 2i,  for  i = 1,2, 3, 4, 

3 

the  middle  eigenvalue  of  T^  ^ is  5,  correct  to  working  accuracy.  The 
corresponding  bound  is  1.25,  which  shows  that  this  fortuitous  accuracy  is 
due  to  the  symmetry  of  the  example,  rather  than  the  accuracy  of  the  Ritz 
vector. 


5 . The  Problem  of  Clustered  Eigenvalues 

If  the  spectrum  of  A is  such  that  5^/4  is  smaller  than  the  given 
convergence  tolerance,  Theorem  4.3  does  not  guarantee  slow  convergence. 
However  starting  vectors  can  still  be  found  which  delay  convergence  a long 


time. 
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Theorem  5.1.  Let  W be  an  A-invariant  subspace  of  maximal  dimen- 
sion such  that  A e A restricted  to  W is  such  that  $;/ 4 is 
larger  than  the  given  convergence  tolerance.  Let  m = dim  W. 

Then  there  exists  a starting  vector  for  A which  delays  conver- 

XL 

gence  until  the  mLn  step. 


Proof.  Apply  Theorem  4.3  to  A to  yield  a starting  vector  . Since 
the  Lanczos  algorithm  run  on  (A,q^)  yields  the  same  T as  the  Lanczos 


algorithm  run  on  (A,q^),  this  q^  will  do. 


In  general,  it  may  be  possible  to  delay  convergence  even  longer. 


b . The  Denefici a 1_ Effects  of  Rounding  Errors 

The  slew  convergence  discussed  in  the  previous  sections  never  seems  to 
occur  in  practice.  The  reason  for  this  lies  in  the  formula  for  p^  given 


in  Theorem  3.2, 


pil  ' "n^VXl.n-l^i^ 


First  we  give  three  examples. 


Example  1.  Linear  distribution. 


A.  = i 


(VXi+1}  _ , 1 

S 1+0 


for  i = 1 ,2, ... ,50 
for  i = 1,2,..., 49  . 


L 


p^  was  computed  by  Theorem  3.2.  p1  is  symmetric  from  top  to  bottom.  The 
largest  elements  of  p^  are  p^  -j  = P20  -j  = .397.  The  smallest  elements 
of  P]  are  p^  = P50J  = .25xlO-14. 


Example  2.  Geometric  distribution. 


* (1.1)’ 

“+» 

O 

-5 

II 

ro 

. ,50 

<w,> 

for  i = 1 ,2,. . 

.,49 

yi  2 

The  largest  element  of  is  Pg  ^ = .495.  The  smallest  element  of  p^ 
is  P5o,,  ■ .162  « 10-52. 


Example  3.  Tchebychev  distribution. 


X.  = 


yi 


cos(H) 

<VW 


for  i = 1 ,2, ...  ,50 
for  i = 1 ,2, ... ,49 


The  largest  elements  of  p^  are  p^  -j  = P26  -|  = .228.  The  smallest  elements 

_3 

of  p1  are  P]  ] = P50  1 = -642*10  • 

The  tiny  elements  of  in  Examples  1 and  2 are  due  to  the  large  varia- 
tion in  magnitudes  of  the  numbers  { x ’ ) I k = 1 ,2 , . . . ,n} . Most  practical 
examples  also  show  large  variations,  which  leads  to  a perverse  starting 
vector  with  some  tiny  eigencomponents.  Such  tiny  components  are  unlikely 
to  appear  in  a randomly  chosen  vector.  More  importantly,  such  tiny  components 
are  unstable  in  the  face  of  rounding  errors. 

By  standard  rounding  error  analysis 


(2) 


q]  = Zp1  + f , I!  f II  < n3/2e 


where  e is  the  relative  machine  precision.  In  exact  arithmetic, 


(3) 


« P-|  + z*f  . 
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Unless  Z has  some  special  symmetry,  the  term  Z*f  will  swamp  any  tiny 
component  of  p^ . That  is,  q^  is  unlikely  to  have  any  eigencomponents 
much  smaller  than  e.  Even  if  q-j  had  precisely  the  eigencomponents  desired, 
the  first  step  of  the  Lanczos  algorithm  would  obliterate  the  small  components 
unless  the  example  were  specially  rigged.  The  first  step  of  the  algorithm 
computes  q^  as 

(4)  3^2  = (A-a1I)q1+f  , where  II f II  _<  n3^2ellA||  . 

Again  f will  be  randomly  distributed  among  the  various  eigendirections 
and  will  prevent  q2  from  inheriting  any  tiny  components  from  q^ . 

We  have  been  able  to  observe  delayed  convergence  for  large  matrices 
only  for  two  classes.  Tchebychev  distributions  come  very  close  to  minimizing 
the  variation  in  x-j  nU.j)-  Therefore  Tchebychev  distributions  (even  on 
fairly  large  problems)  do  not  have  components  of  p^  smaller  than  e.  The 
other  class  of  examples  is  diagonal  matrices  in  which  the  rounding  errors 
are  uncoupled  with  tiny  elements  of  p^  (which  is  q^)  are  not  swamped  by 
small  multiples  of  much  larger  elements. 
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